Results 1 - 9 of 9
1.
IEEE J Biomed Health Inform ; 28(4): 2408-2415, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38319781

ABSTRACT

In bioinformatics, protein function prediction is a fundamental area of research and plays a crucial role in addressing various biological challenges, such as identifying potential drug targets and elucidating disease mechanisms. However, functional annotation databases usually record positive experimental annotations (that a protein carries out a given function) and rarely record negative experimental annotations (that a protein does not carry out a given function). Consequently, existing deep learning methods rely on the positive annotations for prediction and ignore the scarce but informative negative annotations, which leads to an underestimation of precision. To address this issue, we introduce a deep learning method based on a heterogeneous graph attention technique. The method first constructs a heterogeneous graph that covers the protein-protein interaction network, the ontology structure, and positive and negative annotation information. It then learns embedding representations of proteins and ontology terms with heterogeneous graph attention. Finally, it uses the learned representations to reconstruct the positive protein-term associations and to score unobserved functional annotations. Incorporating the limited known negative annotations into the constructed heterogeneous graph enhances predictive performance. Experimental results on three species (Human, Mouse, and Arabidopsis) show that our method predicts new protein annotations better than state-of-the-art methods.
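The attention-and-reconstruction pipeline described above can be illustrated with a minimal pure-Python sketch. It assumes node embeddings are already initialized, uses a single attention head with plain dot-product scores, and treats negative annotations as a separate relation whose messages are subtracted; the relation names, the residual update, and the sign convention are hypothetical simplifications, not the authors' model.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, neighbors):
    """Attention-weighted average of neighbor embeddings (one head, one relation)."""
    if not neighbors:
        return [0.0] * len(query)
    weights = softmax([dot(query, n) for n in neighbors])
    return [sum(w * n[d] for w, n in zip(weights, neighbors)) for d in range(len(query))]

# Hypothetical sign convention: messages along negative-annotation edges are subtracted.
RELATION_SIGN = {"neg_annotation": -1.0}

def update_node(h, neighbors_by_relation):
    """Residual update: add one attention message per relation type (PPI, GO hierarchy, annotations)."""
    out = list(h)
    for relation, neighbors in neighbors_by_relation.items():
        sign = RELATION_SIGN.get(relation, 1.0)
        message = attend(h, neighbors)
        out = [o + sign * m for o, m in zip(out, message)]
    return out

def association_score(protein_vec, term_vec):
    """Reconstructed protein-term association: sigmoid of the embedding dot product."""
    return 1.0 / (1.0 + math.exp(-dot(protein_vec, term_vec)))
```

In a full model the attention scores would come from learned, relation-specific projections; plain dot products keep the sketch short.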


Subjects
Computational Biology; Proteins; Humans; Animals; Mice; Computational Biology/methods; Protein Interaction Maps; Molecular Sequence Annotation; Databases, Factual
2.
Article in English | MEDLINE | ID: mdl-38422367

ABSTRACT

OBJECTIVE: Most existing fine-tuned biomedical large language models (LLMs) focus on improving performance in monolingual biomedical question answering and conversation tasks. To investigate the effectiveness of fine-tuned LLMs on diverse biomedical natural language processing (NLP) tasks in different languages, we present Taiyi, a bilingual fine-tuned LLM for diverse biomedical NLP tasks. MATERIALS AND METHODS: We first curated a comprehensive collection of 140 existing biomedical text mining datasets (102 English and 38 Chinese) covering more than 10 task types. These corpora were then converted into instruction data used to fine-tune a general LLM. For the supervised fine-tuning phase, a 2-stage strategy is proposed to optimize model performance across tasks. RESULTS: Experimental results on 13 test sets, covering named entity recognition, relation extraction, text classification, and question answering, demonstrate that Taiyi outperforms general LLMs. A case study involving additional biomedical NLP tasks further shows Taiyi's considerable potential for bilingual biomedical multitasking. CONCLUSION: Leveraging rich, high-quality biomedical corpora and developing effective fine-tuning strategies can significantly improve the performance of LLMs in the biomedical domain. Through supervised fine-tuning, Taiyi demonstrates bilingual multitasking capability. However, tasks that are not inherently generative, such as information extraction, remain challenging for LLM-based generative approaches, which still underperform conventional discriminative approaches built on smaller language models.
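The conversion of existing datasets into instruction data can be sketched for a named entity recognition sample as follows. The prompt templates, the record fields (`instruction`, `input`, `output`), and the output format are hypothetical; the abstract does not give Taiyi's actual templates.

```python
# Hypothetical bilingual prompt templates; not Taiyi's actual wording.
TEMPLATES = {
    "en": "Extract all entities of the following types from the text: {types}.",
    "zh": "请从下面的文本中抽取以下类型的实体：{types}。",
}

def ner_to_instruction(text, entities, entity_types, lang="en"):
    """Turn one NER sample into an instruction-tuning record.

    entities: list of (span, label) pairs annotated in `text`.
    """
    instruction = TEMPLATES[lang].format(types=", ".join(entity_types))
    if entities:
        output = "; ".join(f"{span} [{label}]" for span, label in entities)
    else:
        output = "None"
    return {"instruction": instruction, "input": text, "output": output}

record = ner_to_instruction(
    "Aspirin relieved the headache.",
    [("Aspirin", "Chemical"), ("headache", "Disease")],
    ["Chemical", "Disease"],
)
print(record["output"])  # Aspirin [Chemical]; headache [Disease]
```

Casting extraction as free-text generation like this is also why such tasks stay hard for generative models: the output must be parsed back into spans and labels.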

3.
IEEE Trans Nanobioscience ; 22(4): 755-762, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37204950

ABSTRACT

Gene Ontology (GO) is a widely used bioinformatics resource for describing the biological processes, molecular functions, and cellular components of proteins. It comprises more than 5000 terms, hierarchically organized into a directed acyclic graph, together with known functional annotations. Automatically annotating protein functions with GO-based computational models has long been an active area of research. However, because functional annotation information is limited and the topological structure of GO is complex, existing models cannot effectively capture a knowledge representation of GO. To solve this issue, we present a method that fuses the functional and topological knowledge of GO to guide protein function prediction. The method employs a multi-view GCN model to extract a variety of GO representations from functional information, topological structure, and their combinations. To dynamically learn the significance weights of these representations, it uses an attention mechanism to learn the final knowledge representation of GO. Furthermore, it uses a pre-trained language model (ESM-1b) to efficiently learn biological features for each protein sequence. Finally, it obtains all predicted scores by computing the dot product of the sequence features and the GO representation. Experimental results on datasets from three species (Yeast, Human, and Arabidopsis) show that our method outperforms other state-of-the-art methods. The code is available at: https://github.com/Candyperfect/Master.
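A minimal sketch of the multi-view fusion and dot-product scoring, under several assumptions: each view comes from one weight-free GCN propagation step over the GO graph, view weights are a softmax over a hypothetical `query` vector, and the ESM-1b sequence embedding is replaced by a given feature vector. Learnable GCN weights and the authors' exact attention form are omitted.

```python
import math

def normalized_propagate(adj, feats):
    """One GCN-style step without learnable weights: D^-1/2 (A + I) D^-1/2 X."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    dim = len(feats[0])
    return [[sum(a_hat[i][k] / math.sqrt(deg[i] * deg[k]) * feats[k][d] for k in range(n))
             for d in range(dim)] for i in range(n)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fuse_views(views, query):
    """Attention over views: each view is scored by the mean dot product of its term rows with `query`."""
    scores = [sum(dot(query, row) for row in view) / len(view) for view in views]
    weights = softmax(scores)
    n_terms, dim = len(views[0]), len(views[0][0])
    return [[sum(w * view[t][d] for w, view in zip(weights, views)) for d in range(dim)]
            for t in range(n_terms)]

def predict(sequence_features, go_repr):
    """Score for every GO term: dot product of the protein feature with the term representation."""
    return [dot(sequence_features, term) for term in go_repr]
```

A production model would use learned weight matrices and a tensor library; plain lists keep the example dependency-free.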


Subjects
Arabidopsis; Proteins; Humans; Gene Ontology; Proteins/genetics; Proteins/metabolism; Semantics; Computational Biology/methods; Arabidopsis/genetics; Arabidopsis/metabolism; Molecular Sequence Annotation
4.
IEEE J Biomed Health Inform ; 27(2): 1140-1148, 2023 Feb.
Article in English | MEDLINE | ID: mdl-37022395

ABSTRACT

Proteins are the main executors of life activities, and accurately predicting their biological functions can help humans better understand the mechanisms of life and promote human development. With the rapid development of high-throughput technologies, an abundance of proteins has been discovered. However, the gap between the number of known proteins and their functional annotations is still huge. To accelerate protein function prediction, several computational methods that exploit multiple data sources have been proposed. Among them, deep learning methods are currently the most popular because they can learn information automatically from raw data. However, owing to the diversity and differing scales of these data, it is challenging for existing deep learning methods to capture related information from different data effectively. In this paper, we introduce DeepAF, a deep learning method that adaptively learns information from protein sequences and biomedical literature. DeepAF first extracts the two kinds of information with separate extractors, which are built on pre-trained language models and can capture rudimentary biological knowledge. Then, to integrate this information, it applies an adaptive fusion layer based on a cross-attention mechanism that models the mutual interactions between the two kinds of information. Finally, DeepAF applies logistic regression to the fused information to obtain prediction scores. Experimental results on datasets from two species (Human and Yeast) show that DeepAF outperforms other state-of-the-art approaches.
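The cross-attention fusion followed by logistic regression can be sketched as below. Assumptions: token embeddings from the two pre-trained extractors are given as vectors of equal dimension, keys and values are the raw embeddings (no learned projections), the fused vector concatenates mean-pooled outputs from both attention directions, and the per-term logistic-regression weights are hypothetical inputs.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, keys_values):
    """Scaled dot-product attention in which one modality queries the other."""
    d = len(queries[0])
    out = []
    for q in queries:
        w = softmax([dot(q, k) / math.sqrt(d) for k in keys_values])
        out.append([sum(wi * kv[j] for wi, kv in zip(w, keys_values)) for j in range(d)])
    return out

def mean_pool(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def fuse(seq_tokens, lit_tokens):
    """Sequence attends to literature and vice versa; the pooled outputs are concatenated."""
    seq_ctx = mean_pool(cross_attend(seq_tokens, lit_tokens))
    lit_ctx = mean_pool(cross_attend(lit_tokens, seq_tokens))
    return seq_ctx + lit_ctx

def logistic_scores(x, weights, biases):
    """One logistic-regression output per GO term."""
    return [1.0 / (1.0 + math.exp(-(dot(w, x) + b))) for w, b in zip(weights, biases)]
```

Attending in both directions lets each modality weight the other's tokens by relevance before pooling, which is one simple reading of "mutual interactions".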


Subjects
Proteins; Saccharomyces cerevisiae; Humans; Proteins/metabolism; Amino Acid Sequence; Saccharomyces cerevisiae/metabolism
5.
IEEE/ACM Trans Comput Biol Bioinform ; 18(4): 1439-1450, 2021.
Article in English | MEDLINE | ID: mdl-31562099

ABSTRACT

Protein function prediction is a fundamental task in the post-genomic era. Available functional annotations of proteins are incomplete, and the annotations of two homologous species complement each other. However, how to effectively leverage the complementary annotations of different species to further boost prediction performance is still not well studied. In this paper, we propose AsyRW, a cross-species protein function prediction approach that performs an Asynchronous Random Walk on a heterogeneous network. AsyRW first constructs a heterogeneous network that integrates multiple functional association networks derived from different biological data, established homology relationships between proteins from different species, known annotations of proteins, and the Gene Ontology (GO). To account for the intrinsic intra- and inter-species structure of proteins and the structure of GO, AsyRW quantifies the individual walk length of each network node using a gravity-like theory, and then performs an asynchronous random walk with these individual lengths to predict associations between proteins and GO terms. Experiments on annotations archived in different years show that individual walk lengths and the asynchronous random walk can effectively leverage the complementary annotations of different species, and that AsyRW performs significantly better than other related and competitive methods. The code of AsyRW is available at: http://mlda.swu.edu.cn/codes.php?name=AsyRW.
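The network construction and asynchronous walk can be sketched as follows. This is a simplified interpretation: the gravity-like computation of individual walk lengths is not reproduced, so the per-node `lengths` are supplied by the caller, and the walk is a random walk with restart in which node i stops updating after `lengths[i]` steps. Intra-species networks and inter-species homology edges are assumed to be merged already into one protein-protein matrix.

```python
def build_hetero(protein_net, annotations, go_net):
    """Block adjacency [[P, Y], [Y^T, G]]: protein nodes first, then GO-term nodes."""
    n, m = len(protein_net), len(go_net)
    top = [list(protein_net[i]) + list(annotations[i]) for i in range(n)]
    bottom = [[annotations[i][j] for i in range(n)] + list(go_net[j]) for j in range(m)]
    return top + bottom

def row_normalize(adj):
    """Turn adjacency rows into transition probabilities (all-zero rows stay zero)."""
    out = []
    for row in adj:
        s = sum(row)
        out.append([x / s if s else 0.0 for x in row])
    return out

def async_walk(adj, seed, lengths, restart=0.5):
    """Random walk with restart; node i is frozen once `step` reaches lengths[i]."""
    trans = row_normalize(adj)
    n = len(adj)
    p = list(seed)
    for step in range(max(lengths)):
        spread = [sum(p[k] * trans[k][i] for k in range(n)) for i in range(n)]
        p = [(1 - restart) * spread[i] + restart * seed[i] if step < lengths[i] else p[i]
             for i in range(n)]
    return p
```

After the walk, the entries for GO-term nodes serve as association scores for the seeded protein.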


Subjects
Computational Biology/methods; Proteins; Animals; Databases, Protein; Gene Ontology; Humans; Proteins/chemistry; Proteins/physiology; Stochastic Processes
6.
Front Genet ; 11: 400, 2020.
Article in English | MEDLINE | ID: mdl-32391061

ABSTRACT

Annotating the functional properties of gene products, i.e., RNAs and proteins, is a fundamental task in biology. The Gene Ontology database (GO) was developed to systematically describe the functional properties of gene products across species and to facilitate the computational prediction of gene function. Because GO is routinely updated, it serves as the gold standard and main knowledge source in functional genomics. Many gene function prediction methods that make use of GO have been proposed. However, no literature review has summarized these methods and the possibilities for future efforts from the perspective of GO. To bridge this gap, we review existing methods with an emphasis on recent solutions. First, we introduce the conventions of GO and the widely adopted evaluation metrics for gene function prediction. Next, we summarize current gene function prediction methods that apply GO in different ways, such as using hierarchical or flat inter-relationships between GO terms, compressing massive numbers of GO terms, and quantifying semantic similarities. Although many efforts have improved performance by harnessing GO, we conclude that many largely overlooked but important topics remain for future research.
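One commonly used evaluation metric for gene function prediction is the protein-centric maximum F-measure (Fmax) popularized by the CAFA challenges. A minimal sketch follows: at each threshold, precision is averaged over proteins with at least one prediction above the threshold and recall over all benchmark proteins. The input layout (per-term score dictionaries and sets of true terms) and the 0.01 to 1.00 threshold grid are assumptions for illustration.

```python
def fmax(predictions, truths, thresholds=None):
    """predictions: {protein: {term: score}}; truths: {protein: set of true terms}."""
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for protein, true_terms in truths.items():
            predicted = {term for term, s in predictions.get(protein, {}).items() if s >= t}
            tp = len(predicted & true_terms)
            if predicted:  # precision only counts proteins with a prediction at this threshold
                precisions.append(tp / len(predicted))
            recalls.append(tp / len(true_terms) if true_terms else 0.0)
        if not precisions:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best
```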

7.
Genomics ; 111(3): 334-342, 2019 May.
Article in English | MEDLINE | ID: mdl-29477548

ABSTRACT

Gene Ontology (GO) uses structured vocabularies (or terms) to describe the molecular functions, biological roles, and cellular locations of gene products in a hierarchical ontology. GO annotations associate genes with GO terms and indicate that the given gene products carry out the biological functions described by the relevant terms. However, predicting correct GO annotations for genes from the massive set of terms defined by GO is a difficult challenge. To address this challenge, we introduce a semantic method for gene function prediction based on Gene Ontology Hierarchy Preserving Hashing (HPHash). HPHash first measures the taxonomic similarity between GO terms. It then uses a hierarchy preserving hashing technique to keep the hierarchical order between GO terms and to optimize a series of hashing functions that encode massive numbers of GO terms as compact binary codes. After that, HPHash uses these hashing functions to project the gene-term association matrix into a low-dimensional one and performs semantic-similarity-based gene function prediction in the low-dimensional space. Experimental results on three model species (Homo sapiens, Mus musculus, and Rattus norvegicus) for interspecies gene function prediction show that HPHash performs better than other related approaches and is robust to the number of hash functions. We also use HPHash as a plugin for BLAST-based gene function prediction, where it again significantly improves prediction performance. The code of HPHash is available at: http://mlda.swu.edu.cn/codes.php?name=HPHash.
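The hash-then-compare idea can be sketched as follows. This sketch swaps HPHash's learned, hierarchy-preserving hash functions for random-hyperplane hashing, a standard locality-sensitive hashing scheme: terms with similar feature vectors (for example, rows of a term-term taxonomic similarity matrix) tend to receive similar codes, but the hierarchical order is not explicitly enforced. The term features, `n_bits`, and `seed` are hypothetical inputs.

```python
import math
import random

def hash_terms(term_features, n_bits, seed=0):
    """+1/-1 code per GO term from random hyperplanes (LSH stand-in for learned hash functions)."""
    rng = random.Random(seed)
    dim = len(term_features[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
    return [[1 if sum(w * x for w, x in zip(plane, feat)) >= 0 else -1 for plane in planes]
            for feat in term_features]

def project_genes(assoc, term_codes):
    """Compress the gene-term matrix (genes x terms) into genes x n_bits."""
    n_bits = len(term_codes[0])
    return [[sum(a * code[k] for a, code in zip(row, term_codes)) for k in range(n_bits)]
            for row in assoc]

def cosine(u, v):
    """Semantic similarity between two genes in the low-dimensional space."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / (nu * nv) if nu and nv else 0.0
```

Genes with the highest cosine similarity to a query gene would then supply candidate annotations.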


Subjects
Gene Ontology; Software; Animals; Humans; Mice; Rats; Semantics
8.
IEEE/ACM Trans Comput Biol Bioinform ; 15(4): 1390-1402, 2018.
Article in English | MEDLINE | ID: mdl-28641268

ABSTRACT

A remaining key challenge of modern biology is annotating the functional roles of proteins. Various computational models have been proposed for this challenge. Most of them assume that the annotations of annotated proteins are complete, but in fact many of these annotations are incomplete. We propose a method called NewGOA to predict new Gene Ontology (GO) annotations for incompletely annotated proteins and for completely unannotated ones. NewGOA employs a hybrid graph, composed of two types of nodes (proteins and GO terms), to encode interactions between proteins, hierarchical relationships between terms, and available annotations of proteins. To account for the structural difference between the GO term subgraph and the protein subgraph, NewGOA applies a bi-random walks algorithm, which executes asynchronous random walks on the hybrid graph, to predict new GO annotations of proteins. An experimental study on archived GO annotations of two model species (H. sapiens and S. cerevisiae) shows that NewGOA predicts new protein annotations more accurately and efficiently than other related methods. Experimental results also indicate that the bi-random walks can explore and further exploit the structural difference between the GO term subgraph and the protein subgraph. The supplementary files and code of NewGOA are available at: http://mlda.swu.edu.cn/codes.php?name=NewGOA.
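A bi-random walk of this kind can be sketched as a BiRW-style update, assuming the following form: at each iteration the protein-term score matrix is propagated over the protein subgraph (left walk, for up to `left_steps` iterations) and over the GO-term subgraph (right walk, for up to `right_steps` iterations), the active walks are averaged, and a fraction `1 - alpha` of the original annotations is restored. Allowing different step counts on the two sides is what accounts for the structural difference between the subgraphs. The step counts, `alpha`, and row normalization are hypothetical choices, not NewGOA's published settings.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def row_normalize(adj):
    out = []
    for row in adj:
        s = sum(row)
        out.append([x / s if s else 0.0 for x in row])
    return out

def bi_random_walk(ppi, go, y0, left_steps, right_steps, alpha=0.8):
    """y0: proteins x terms matrix of known annotations; returns propagated scores."""
    p = row_normalize(ppi)
    g = row_normalize(go)
    y = [list(row) for row in y0]
    for t in range(1, max(left_steps, right_steps) + 1):
        parts = []
        if t <= left_steps:
            parts.append(matmul(p, y))  # walk over the protein subgraph
        if t <= right_steps:
            parts.append(matmul(y, g))  # walk over the GO-term subgraph
        y = [[alpha * sum(part[i][j] for part in parts) / len(parts) + (1 - alpha) * y0[i][j]
              for j in range(len(y0[0]))] for i in range(len(y0))]
    return y
```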


Subjects
Computational Biology/methods; Gene Ontology; Molecular Sequence Annotation/methods; Protein Interaction Mapping/methods; Protein Interaction Maps/genetics; Humans; Proteins/genetics; Saccharomyces cerevisiae Proteins/genetics
9.
Comput Biol Chem ; 71: 264-273, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29031869

ABSTRACT

Gene Ontology (GO) is a standardized, controlled vocabulary of terms that describe the molecular functions, biological roles, and cellular locations of proteins. GO terms and the GO hierarchy are regularly updated as biological knowledge accumulates. GO includes more than 50,000 terms, and each protein is annotated with anywhere from several to dozens of these terms. Therefore, accurately predicting the associations between proteins and massive numbers of GO terms is rather challenging. To address this, we propose a method called Hashing GO for protein function prediction (HashGO). HashGO first adopts a protein-term association matrix to store the available GO annotations of proteins. Then, it tailors a graph hashing method to explore the underlying structure between GO terms and to obtain a series of hash functions that compress the high-dimensional protein-term association matrix into a low-dimensional one. Next, HashGO computes the semantic similarity between proteins based on the Hamming distance on that low-dimensional matrix. Finally, it predicts the missing annotations of a protein from the annotations of its semantic neighbors. Experimental results on archived GO annotations of two model species (Yeast and Human) show that HashGO not only predicts functions more accurately than other related approaches but also runs faster.
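The neighbor-based prediction step can be sketched as follows. The sketch assumes each protein's binary code has already been produced (the graph hashing step that learns the hash functions is not reproduced), selects a hypothetical `k` nearest neighbors by Hamming distance, and scores each candidate term by the fraction of those neighbors annotated with it.

```python
def hamming(a, b):
    """Number of differing bits between two equal-length binary codes."""
    return sum(x != y for x, y in zip(a, b))

def predict_missing(codes, annotations, query, k=2):
    """codes: {protein: bit list}; annotations: {protein: set of GO terms}."""
    others = [p for p in codes if p != query]
    # Nearest semantic neighbors; ties broken by protein name for determinism.
    neighbors = sorted(others, key=lambda p: (hamming(codes[query], codes[p]), p))[:k]
    votes = {}
    for p in neighbors:
        for term in annotations.get(p, set()):
            votes[term] = votes.get(term, 0) + 1
    known = annotations.get(query, set())
    # Report only terms the query protein does not already carry.
    return {t: c / len(neighbors) for t, c in votes.items() if t not in known}
```

Hamming distance on short codes is a cheap bit comparison, which is the source of the speed advantage such hashing methods aim for.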


Subjects
Gene Ontology; Proteins/metabolism; Databases, Protein; Humans; Proteins/genetics